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Genomic surveys in humans identify a large amount of recent 
positive selection. Using the 3.9-million HapMap SNP dataset, we 
found that selection has accelerated greatly during the last 40,000 
years. We tested the null hypothesis that the observed age distri- 
bution of recent positively selected linkage blocks is consistent 
with a constant rate of adaptive substitution during human evo- 
lution. We show that a constant rate high enough to explain the 
number of recently selected variants would predict (i) site het- 
erozygosity at least 10-fold lower than is observed in humans, (ii) 
a strong relationship of heterozygosity and local recombination 
rate, which is not observed in humans, (iif) an implausibly high 
number of adaptive substitutions between humans and chimpan- 
zees, and (iv) nearly 100 times the observed number of high- 
frequency linkage disequilibrium blocks. Larger populations gen- 
erate more new selected mutations, and we show the consistency 
of the observed data with the historical pattern of human popu- 
lation growth. We consider human demographic growth to be 
linked with past changes in human cultures and ecologies. Both 
processes have contributed to the extraordinarily rapid recent 
genetic evolution of our species. 
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ial uman populations have increased vastly in numbers during 
the past 50,000 years or more (1). In theory, more people 
means more new adaptive mutations (2). Hence, human popu- 
lation growth should have increased in the rate of adaptive 
substitutions: an acceleration of new positively selected alleles. 

Can this idea really describe recent human evolution? There 
are several possible problems. Only a small fraction of all 
mutations are advantageous; most are neutral or deleterious. 
Moreover, as a population becomes more and more adapted to 
its current environment, new mutations should be less and less 
likely to increase fitness. Because species with large population 
sizes reach an adaptive peak, their rate of adaptive evolution over 
geologic time should not greatly exceed that of rare species (3). 

But humans are in an exceptional demographic and ecological 
transient. Rapid population growth has been coupled with vast 
changes in cultures and ecology during the Late Pleistocene and 
Holocene, creating new opportunities for adaptation. The past 
10,000 years have seen rapid skeletal and dental evolution in 
human populations and the appearance of many new genetic 
responses to diets and disease (4). 

In such a transient, large population, size increases the rate 
and effectiveness of adaptive responses. For example, natural 
insect populations often produce effective monogenic resistance 
to pesticides, whereas small laboratory populations under similar 
selection develop less effective polygenic adaptations (5). Che- 
mostat experiments on Escherichia coli show a continued re- 
sponse to selection (6), with continuous and repeatable re- 
sponses in large populations but variable and episodic responses 
in small populations (7). These results are explained by a model 
in which smaller population size limits the rate of adaptive 
evolution (8). A population that suddenly increases in size has 
the potential for rapid adaptive change. The best analogy to 
recent human evolution may be the rapid evolution of domes- 
ticates such as maize (9, 10). 
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Human genetic variation appears consistent with a recent accel- 
eration of positive selection. A new advantageous mutation that 
escapes genetic drift will rapidly increase in frequency, more quickly 
than recombination can shuffle it with other genetic variants (11). 
As a result, selection generates long-range blocks of linkage dis- 
equilibrium (LD) across tens or hundreds of kilobases, depending 
on the age of the selected variant and the local recombination rate. 
The expected decay of LD with distance surrounding a recently 
selected allele provides a powerful means of discriminating selec- 
tion from other demographic causes of extended LD, such as 
bottlenecks and admixture (9, 12). 

The important reason for this increase in discrimination is the 
vastly different genomic scale that LD-based approaches use 
compared with previous methods (scales of millions of bases 
rather than thousands of bases). LD methods use polymorphism 
distance and order information and frequency to search for 
selection, unlike all previous methods (9, 12). Previous methods, 
therefore, have difficulty defining selection unambiguously from 
other population architectures on the kb scale usually examined. 
On the megabase (Mb) scale examined by LD approaches, 
however, extensive modeling and simulations indicate that other 
demographic causes of extensive LD can be discriminated easily 
from those caused by adaptive selection (9). Further, current LD 
approaches restrict comparisons to a set of frequencies and 
inferred allele ages for which neutral explanations are essentially 
implausible. 

Previously, we applied the LD decay (LDD) test to SNP data 
from Perlegen and the HapMap (13), finding evidence for recent 
selection on ~1,800 human genes. We refer to these as ascer- 
tained selected variants (ASVs). The probabilistic LDD test 
searches for the expected decay of adjacent SNPs surrounding a 
recently selected allele. Importantly, the method is insensitive to 
local recombination rate, because local rate influences the extent 
of LD surrounding both alleles, while the method looks for LD 
differences between alleles. Further, the method relies only on 
high heterozygosity SNPs for analysis, exactly the type of data 
obtained for the HapMap project. 

The number of ASVs detected encompasses some 7% of 
human genes and is consistent with the proportion found in 
another survey using a related approach (12). Because LD 
decays quickly over time, most ASVs are quite recent (14), 
compared with other approaches that detect selection over 
longer evolutionary time scales (15, 16). Many human genes are 
now known to have strongly selected alleles in recent historical 
times, such as lactase (17, 18), CCRS5 (19, 20), and FY (21). These 
surveys show that such genes are very common. This observation 
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is surprising: in theory, such strongly selected variants should be 
rare (2, 3). The observed distribution seems to reflect an 
exceptionally rapid rate of adaptive evolution. 

But the hypothesis that genomic data show a high recent rate 
of selection must overcome two principal objections: (7) The 
LDD test might miss older selection and (i) a high constant rate 
of adaptive substitution might also explain the large number of 
ASVs. The first objection is addressed by recalculating the LDD 
test on a 3-fold larger dataset, because higher SNP density is 
needed to detect older selected alleles with comparable sensi- 
tivity. We test the second objection by considering a constant 
rate as the null hypothesis then deriving and testing genomic 
consequences. 


Results 


Finding Old Alleles. The original Perlegen and HapMap datasets 
were relatively small (1.6 million and 1.0 million SNPs, respec- 
tively). The low SNP density limited the power of LD methods 
to detect older selection events, particularly in high- 
recombination areas of the genome (9). Likewise, a related study 
of selection (12) was biased toward newer alleles by requiring 
multiple adjacent SNPs to exhibit extended LD. Older selected 
alleles, where LDD is more rapid, would be rejected with this 
approach. Neither of those previous studies (9, 12) attempted to 
quantitate the numbers of selected events over an extended time 
frame, but were merely initial searches for recent extended LD 
at individual alleles, the most sensitive method to detect recent 
adaptive change. Both found abundant evidence for recent 
selection. 

Therefore, we have now recomputed the LDD test on the 
newly released 3.9-million HapMap genotype dataset (13). By 
varying the LDD test search parameters, we can now statistically 
detect alleles with more rapid LDD (and hence older inferred 
ages) (9). For all parameters used, the detection threshold was 
set at an average log likelihood (ALnLH) > 2.6 SD (=99.5th 
percentile) from the genome average. Again, this LDD threshold 
is a stringent cutoff for the detection of genomic outliers, 
because the high number of selective events are included in the 
genome average (9). The probabilistic LDD test does not require 
the calculation of inferred haplotypes (9), so it is not a daunting 
computational task to calculate ALnLH values for the HapMap 
3.9 million SNPs genotyped in 270 individuals: 90 European 
ancestry (CEU), 90 African (Yoruba) ancestry (YRIJ), 45 Han 
Chinese (CHB), and 45 Japanese (JPT). 

This analysis uncovered only 12 new SNPs (in six clusters) not 
originally detected in the CEU population (9) and 466 new SNPs 
representing 206 independent clusters in the YRI population. A 
total of 2,803 (CEU), 2,367 (CHB), 2,783 (JPT), and 3,486 
(YRI) selection events were found. As noted (9), many inferred 
selected sites have faster LDD in YRI samples (with older 
coalescence times), resulting in lower background LD and more 
previously unobserved variants. The denser HapMap dataset 
provided better resolution of LDD (i.e., rapid decay can be 
reliably detected from background LD only with high density). 
The 3.9-million HapMap dataset discovered more ASVs, but 
only an incremental increase in the CEU and a (~7%) increase 
in YRI values. This finding indicates that most events (defined 
by the LDD test) coalescing to ages up to 80,000 years ago have 
been detected, and any ascertainment bias against older selec- 
tion is very slight within the given frequency range. 

Ancient selected alleles are also more likely to be near or at 
fixation than recent alleles. Just as we excluded rare alleles, we 
also excluded high-frequency alleles (i.e., >78%) in our age 
distribution. But the number of such high-frequency alleles 
provides another test of the hypothesis that the LDD test has 
missed older events. We modified the LDD test to find these 
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high-frequency “near-fixed” alleles and found only 50 candi- 
dates. Other studies have likewise found few near-fixed alleles 
(22, 23). These studies also show that very few ASVs are shared 
between HapMap samples; most are population-specific (9, 12). 
In our data, only 509 clusters are shared between CEU and YRI 
samples; many of these are likely to have been under balancing 
selection [supporting information (SI) Appendix]. The small 
number of near-fixed events and the small number of shared 
events are strong evidence that the LDD test has not missed a 
large number of ancient selected alleles. 


Allele Ages. We used a modification of described methods (24-26) 
to estimate an allele age (coalescence time) for each selected 
cluster. We focused on the HapMap populations with the largest 
sample sizes, which were the YRI and CEU samples. Similar results 
were obtained for the CHB and JPT populations (data not shown). 

Fig. 1 presents histograms of these age estimates. The YRI 
sample shows a modal (peak) age of ~8,000 years ago, assuming 
25-year generations; the CEU sample shows a peak age of 
~5,250 years ago, both values consistent with earlier work (9, 
12). The difference in peak age likely explains why weaker tests 
have found stronger evidence of selection in European ancestry 
samples (27, 28), unlike the current study. 


Rate Estimation. Using the diffusion model of positive selection 
(29), we estimated the adaptive substitution rate consistent with 
the observed age distribution of ASVs. For the YRI data, this 
estimate is 0.53 substitutions per year. For the CEU data, this 
estimate is 0.59 substitutions per year. The average fitness 
advantage of new variants (assuming dominant effects) is esti- 
mated as 0.022 for the YRI distribution and 0.034 for the CEU 
distribution. Curves obtained by using these estimated values fit 
the observed data well (Fig. 1). The higher estimated rate for 
Europeans emerges from the more recent modal age of variants. 
For further analyses, we used the lower rate estimated from the 
YRI sample as a conservative value. 


Predictions of Constant Rate. We can derive four predictions from 
the rate of adaptive substitution, each of which refutes the null 
hypothesis of constant rate: 


1. The null hypothesis predicts that the average nucleotide 
diversity across the genome should be vastly lower than 
observed. Recurrent selected substitutions greatly reduce the 
diversity of linked neutral alleles by hitchhiking or 
pseudohitchhiking (30, 31). Using an approximation for site 
heterozygosity under pseudohitchhiking (30, 32) we esti- 
mated the expected site heterozygosity under the null hy- 
pothesis as 3.5 X 10—> (SI Appendix). This value is less than 
one-tenth the observed site heterozygosity, which is between 
4.0 and 6.0 X 10~4 in human populations (13, 33, 34). 

2. Hitchhiking is more important in regions of low recombina- 
tion, so the null hypothesis predicts a strong relationship 
between nucleotide diversity and local recombination rate. 
The null hypothesis predicts a 10-fold increase in diversity 
across the range of local recombination rates represented by 
human gene regions. Empirically, diversity is slightly corre- 
lated with local recombination rate, but the relationship is 
weak and may be partly explained by mutation rate (13, 35). 

3. The annual rate of 0.53 adaptive substitutions consistent with 
the YRI data predicts an implausible 6.4 million adaptive 
substitutions between humans and chimpanzees. In contrast, 
there are only ~40,000-aa substitutions separating these 
species, and only ~18 million total substitutions (36). This 
amount of selection, amounting to >1/3 of all substitutions, 
or 100 times the observed number of amino acid substitutions, 
is implausible. 
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Age distribution of ascertained selected alleles. Each point represents the number of variants dated to a single 10-generation bin. Fitted curves are the 


number of ascertained variants predicted by Eq. 2 under a constant population size and constant 5 = 0.022 for YRI and 5 = 0.034 for CEU. The distribution drops 
to zero approaching the present, because all alleles have frequencies >22% today. The 2,965 (YRI) and 2,246 (CEU) selection ages shown have had 509 alleles 
removed that are likely examples of ongoing balanced selection (S/ Appendix). Including these alleles in the analysis does not change the overall conclusion of 


acceleration of selection. 


4. The null hypothesis predicts that many selected alleles should 
be found between 78% and 100% frequency. Positively 
selected alleles follow a logistic growth curve, which proceeds 
very rapidly through intermediate frequencies. Because se- 
lected alleles spend relatively little time in the ascertainment 
range, the ascertained blocks should be the “tip of the 
iceberg” of a larger number of recently selected blocks at or 
near fixation. For example, the ASVs in the YRI dataset have 
a modal age of ~8,000 years ago. Based on the diffusion 
model for selection on an additive gene, ascertained variants 
should account for only 18% of the total number of selected 
variants still segregating. In contrast, 41% of segregating 
variants should be >78%. Dominant alleles (which have a 
higher fixation probability) progress even more slowly 
(>78%), so that additivity is the more conservative assump- 
tion. Empirically, few such near-fixed variants with high LD 
scores have been found in the human genome (13). Modifying 
the LDD algorithm to specifically search for high-frequency 
“fixed” alleles found only 50 potential sites, in contrast to the 
>5,000 predicted by the constant rate model. Although it is 
possible that the rapid LDD expected for older selected 
alleles near fixation may not be detected as efficiently by the 
LDD test, two other surveys have also found small numbers 
of such events (22, 23). This difference of two orders of 
magnitude is a strong refutation of the null hypothesis. 


Population Growth. The rate of adaptive evolution in human 
populations has indeed accelerated within the past 80,000 years. 
The results above demonstrate the extent of acceleration: the 
recent rate must be one to two orders of magnitude higher than 
the long-term rate to explain the genomewide pattern. 
Population growth itself predicts an acceleration effect, be- 
cause the number of new mutations increases as a linear product 
of the number of individuals (2), and exponential growth in- 
creases the fixation probability of new adaptive mutations (37). 
We considered the hypothesis that the magnitude of human 
population growth might explain a large fraction of the recent 
acceleration of new adaptive alleles. To test this hypothesis, we 
constructed a model of historic and prehistoric population 
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growth, based on historical and archaeological estimates of 
population size (1, 38, 39). 

Population growth in the Upper Paleolithic and Late Middle 
Stone Age began by 50,000 years ago. Several archaeological 
indicators show long-term increases in population density, in- 
cluding more small-game exploitation, greater pressure on easily 
collected prey species like tortoises and shellfish, more intense 
hunting of dangerous prey species, and occupation of previously 
uninhabited islands and circumarctic regions (40). Demographic 
growth intensified during the Holocene, as domestication cen- 
ters in the Near East, Egypt, and China underwent expansions 
commencing by 10,000 to 8,000 years ago (41, 42). From these 
centers, population growth spread into Europe, North Africa, 
South Asia, Southeast Asia, and Australasia during the succeed- 
ing 6,000 years (42, 43). Sub-Saharan Africa bears special 
consideration, because of its initial large population size and 
influence on earlier human dispersals (44). Despite the possible 
early appearance of annual cereal collection and cattle hus- 
bandry in North Africa, sub-Saharan Africa has no archaeolog- 
ical evidence for agriculture before 4,000 years ago (42). West 
Asian agricultural plants like wheat did poorly in tropical sun and 
rainfall regimes, while animals faced a series of diseases that 
posed barriers to entry (45). As a consequence, some 2,500 years 
ago the population of sub-Saharan Africa was likely <7 million 
people, compared with European, West Asian, East Asian, and 
South Asian populations approaching or in excess of 30 million 
each (1). At that time, the sub-Saharan population grew at a high 
rate, with the dispersal of Bantu populations from West Africa 
and the spread of pastoralism and agriculture southward through 
East Africa (46, 47). Our model based on archaeological and 
historical evidence includes large long-term African population 
size, gradual Late Pleistocene population growth, an early 
Neolithic transition in West Asia and Europe, and a later rise in 
the rate of growth in sub-Saharan Africa coincident with agri- 
cultural dispersal (Fig. 2). 

As shown in Fig. 3, the demographic model predicts the recent 
peak ages of the African and European distributions of selected 
variants, at a much lower average selection intensity than the 
constant population size model. In particular, the demographic 
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Fig. 2. 
and the earlier Neolithic growth in core agricultural areas. 


model readily explains the difference in age distributions be- 
tween YRI and CEU samples: the YRI sample has more variants 
dating to earlier times when African populations were large 
compared with West Asia and Europe, whereas earlier Neolithic 
growth in West Asia and Europe led to a pulse of recent variants 
in those regions. The data that falsify the constant rate model, 
such as the observed genomewide heterozygosity value and the 
probable number of human—chimpanzee adaptive substitutions, 
are fully consistent with the demographic model. 
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Fig. 3. Tip of the iceberg. Both the demographic and constant-rate models 
can account for the age distribution of ascertained variants (CEU data shown), 
but they differ greatly in the expected number of variants above the ascer- 
tainment frequency (fixed or near-fixed). The demographic model predicts a 
low long-term substitution rate and few alleles >78%, consistent with the 
observed data. 
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Historic and prehistoric population size estimates for human populations (S/ Appendix). Key features are the larger ancestral African population size 


Discussion 


Our simple demographic model explains much of the recent 
pattern, but some aspects remain. Although the small number of 
high-frequency variants (between 78% and 100%) is much more 
consistent with the demographic model than a constant rate of 
change, it is still relatively low, even considering the rapid 
acceleration predicted by demography. Demographic change 
may be the major driver of new adaptive evolution, but the 
detailed pattern must involve gene functions and gene- 
environment interactions. 

Cultural and ecological changes in human populations may 
explain many details of the pattern. Human migrations into 
Eurasia created new selective pressures on features such as skin 
pigmentation, adaptation to cold, and diet (25, 26, 28). Over this 
time span, humans both inside and outside of Africa underwent 
rapid skeletal evolution (48, 49). Some of the most radical new 
selective pressures have been associated with the transition to 
agriculture (4). For example, genes related to disease resistance 
are among the inferred functional classes most likely to show 
evidence of recent positive selection (9). Virulent epidemic 
diseases, including smallpox, malaria, yellow fever, typhus, and 
cholera, became important causes of mortality after the origin 
and spread of agriculture (50). Likewise, subsistence and dietary 
changes have led to selection on genes such as lactase (18). 

It is sometimes claimed that the pace of human evolution 
should have slowed as cultural adaptation supplanted genetic 
adaptation. The high empirical number of recent adaptive 
variants would seem sufficient to refute this claim (9, 12). It is 
important to note that the peak ages of new selected variants in 
our data do not reflect the highest intensity of selection, but 
merely our ability to detect selection. Because of the recent 
acceleration, many more new adaptive mutations should exist 
than have yet been ascertained, occurring at a faster and faster 
rate during historic times. Adaptive alleles with frequencies 
<22% should then greatly outnumber those at higher frequen- 
cies. To the extent that new adaptive alleles continued to reflect 
demographic growth, the Neolithic and later periods would have 
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experienced a rate of adaptive evolution >100 times higher than 
characterized most of human evolution. Cultural changes have 
reduced mortality rates, but variance in reproduction has con- 
tinued to fuel genetic change (51). In our view, the rapid cultural 
evolution during the Late Pleistocene created vastly more op- 
portunities for further genetic change, not fewer, as new avenues 
emerged for communication, social interactions, and creativity. 


Materials and Methods 


The 3.9-million HapMap release was obtained from the International HapMap 
Project website (www.hapmap.org). The LDD test (9) was applied to all four 
HapMap population datasets. Briefly, by examining individuals homozygous 
for a given SNP, the fraction of inferred recombinant chromosomes (FRC) at 
adjacent polymorphisms can be directly computed without the need to infer 
haplotype, a computationally daunting task on such large datasets. The test 
uses the expected increase with distance in FRC surrounding a selected allele 
to identify such alleles. Importantly, the method is insensitive to local recom- 
bination rate, because local rate will influence the extent of LD surrounding 
all alleles, while the method looks for LD differences between alleles. By using 
a large sliding window (ranging from 0.25 to 1.0 Mb in the current study), and 
by explicitly acknowledging the expected LD structure of selected alleles, the 
LDD test can distinguish selection from other population genetic/demo- 
graphic mechanisms, resulting in large LD blocks (9). 

A modification of the LDD test was conducted on the CEU and YRI datasets, to 
find selected alleles near fixation. Unlike the normal LDD test, all SNPs >78% 
frequency (the cutoff used for primary analysis of this data) were queried, using 
the same sliding windows as the normal test. Unlike the standard test, however, 
the requirement that the alternative allele be no more than 1 SD from the 
genome average was not implemented (9). Ninety-three clusters were identified 
in the CEU population and 85 were identified in the YRI population (with 65 
overlaps), a total of 113 fixed events. Unlike normal LDD screens (9), half of these 
observed fixed events determined by long-range LD were in extreme centromeric 
or telomeric regions, which have no recombination or high recombination, 
respectively (13, 52). The interpretation of extended LD in these regions is 
ambiguous, therefore, because low recombination maintains large LD blocks 
(centromeres), and well documented high telomere—telomere exchange homog- 
enizes these regions (52). Removing these centromeric and telomeric regions in 
which LD is likely to be the result of mechanisms different from selection yields 
~50 regions of potential fixation. 


Clustering. The LDD test produces ‘‘clusters’’ of SNPs with the signature of 
selection, because of the extensive LD surrounding these alleles (9). Each 
cluster is likely to represent a single selection event, and hence we have 
attempted to minimize potential overcounting by cluster analysis. Using a 
simple nearest-neighbor technique, we assign a 10-kb radius to each selected 
SNP. Each pass through the data produces a new set of centroids, and cluster 
membership is reassigned to the nearest centroid. A SNP that lies >20 kb away 
from the nearest centroid is considered a new cluster, with it being the sole 
member. Using larger window sizes (up to 100 kb) reduces the number of 
independent clusters (by approximately half), however, at the cost of ““fusing”’ 
likely independent events (data not shown). We believe the 10-kb window, 
therefore, is a conservative first-pass clustering of the observed selection 
events. 

Each selected SNP identified by the LDD test was sorted and mapped to its 
physical location on human chromosomes (University of California Santa Cruz 
Human Genome 17). We iterate through the SNP list, starting with the most 
distal, and a SNP and its closest neighbor (within 10 kb radius) are clustered 
together with a new centroid (average) / computed. To be included as part of 
the ith cluster, the next SNP on the sorted SNP list must fall within 20 kb of the 
ith cluster. If it is within 20 kb of both an upstream and downstream cluster, 
to be integrated in the /th cluster it must have a distance to the /th centroid 
closer than the next closest centroid (ij + 1). Otherwise, a new centroid and 
cluster is initiated. This task is repeated for all SNPs identified by the LDD test. 


Allele Age Calculations. Coalescence times (commonly referred to as allele 
ages) were calculated by methods described (24-26). Briefly, information 
contained in neighboring SNPs and the local recombination frequency is used 
to infer age. The genotyped population is binned (at the SNP under inferred 
selection, the target SNP) into the major and minor alleles (9). While every 
neighboring SNP gives information on the age of the target SNP, a single 
recombination event carries all of the downstream neighbors to an equal or 
higher FRC. Hence, our algorithm moves away (positively and negatively) from 
the target SNP and computes allele age only when a higher FRC level is reached 
in a neighboring SNP. A single neighboring SNP with no neighbors within 20 
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kb is not used for computation. This method is consistent with the theoretical 
and experimental expectations of LDD surrounding selected alleles (9). 
For neighboring SNPs, allele age is computed by using: 


1 x,y 

‘= In(1 — c) nf 2), [1] 
where t = allele age (in generations), c = recombination rate (calculated at the 
distance to the neighboring SNP), x¢ = frequency in generation t, and y = 
frequency on ancestral chromosomes. This method is a method-of-moments 
estimator (24), because the estimate results from equating the observed 
proportion of nonrecombinant chromosomes with the proportion expected if 
the true value of t is the estimated value. It requires no population genetic or 
demographic assumptions, only the exponential decay of initially perfect LD 
because of recombination. Estimates are obtained until FRC reaches 0.3, to 
avoid allele age calculations of lower reliability. We assume the ancestral 
allele is always the allele with neutral or genome average LDD ALnLH scores 
(9). Average regional recombination rates were obtained by querying data 
from ref. 53 in the University of California Santa Cruz database (http:// 
genome.ucsc.edu). Regions with <0.1 cM/Mb average recombination rate 
were excluded. All allele age estimates are averages of the individual calcu- 
lations at the target SNP (26). 





Estimating the Rate of Adaptive Substitutions. Under the null hypothesis of a 
constant rate of adaptive substitution, the age distribution of ASVs can 
estimate the mean fitness advantage (5) of new selected variants. The empir- 
ical distribution of fitness effects of adaptive substitutions is not known. On 
theoretical grounds, this distribution is expected to approximate a negative 
exponential (3). Other studies have assumed this distribution or a gamma 
distribution with similar shape (54-56), and selected mutations in laboratory 
organisms appear to fit this theoretical model (57, 58). In these expressions, s 
is the selection coefficient favoring a new mutation, and s is the mean 
selection coefficient among the set of all advantageous mutations. We assume 
that adaptive alleles are dominant in effect, which allows the highest fixation 
probability (59) and the most rapid increase in frequencies and is therefore 
conservative (less dominance requires a higher substitution rate to explain the 
observed distribution). The value of 5 is not known, and we are concerned with 
finding the single value that creates the best fit of the population size 
prediction to the observed data. We assumed a negative exponential distri- 
bution of s, in which Pr[s] = e~“S. The number of ascertained new adaptive 
variants originating in any single generation t is given by the equation: 


b 
4ANwv | se’ ds. [2] 


a 


Ny, asc — 


Here, vis the rate of adaptive mutations per genome per generation, and N; 
is the effective population size in generation t. This integral derives from the 
expectation of adaptive mutations in a diploid population (here, 2Nv) multi- 
plied by the fixation probability 2s for each, again assuming dominant fitness 
effect. Under the null hypothesis, the population size N; is constant across all 
generations, so the expected number of new adaptive mutations (ascertained 
and nonascertained) is likewise constant. 

We considered the range of s between value a, yielding a current mean 
frequency of 0.22, and value b, yielding a current mean frequency of 0.78, as 
derived from the diffusion approximation for dominant advantageous alleles 
(60). The parameter v is constant in effect across all generations, while the 
number of ascertained variants originating in each generation varies with the 
range of s placing new alleles in the ascertainment range. We applied a 
hill-climbing algorithm to find the best-fit value of 5 for the empirical distri- 
bution of block ages, allowing v to vary freely. With an estimate for s, the rate 
of adaptive mutations, v, can be estimated as the value that satisfies Eq. 2. This 
value is also sufficient to estimate the expected number of substitutions per 
generation, which is the value of the integral in Eq. 2 over the range 0 to 
infinity (in our analyses, the vast majority had 0.01 =s <0.1). For the YRI data, 
assuming dominant fitness effects, the resulting estimate of adaptive substi- 
tution rate is 13.25 per generation, or 0.53 per year. 
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